Abstract


We present progress toward an algorithm that provides short certificates of unsatisfiability with high probability when inputs are random instances of 3-SAT. Such an algorithm would incorporate an approximation algorithm $\mathcal{A}$ for the 3-Hitting Set problem. Using $\mathcal{A}$, it would determine an approximation to the minimum fraction of variables that must be set to true (false) in order to satisfy the positive (negative) clauses. If the fraction is high enough, then the instance is deemed unsatisfiable.
1 Introduction

It is well known that the problem of determining whether a given propositional formula in Conjunctive Normal Form (CNF) has a satisfying truth assignment is NP-complete. If every clause has exactly three literals, the problem is called 3-SAT, and it is also NP-complete. However, there exist polynomial time algorithms that, under certain circumstances, find a solution to a random satisfiable instance of 3-SAT with high probability. This paper is concerned with whether there exists a polynomial time algorithm that, with high probability, verifies the unsatisfiability of a random unsatisfiable instance of 3-SAT.

Let $I$ be a random CNF Boolean expression in which each of $m$ clauses has exactly 3 literals, whose variables are taken uniformly and independently from a set $V$ of $n$ Boolean variables and complemented independently with probability 1/2. Below, we refer to this model for generating a random instance as $M(m, n, 3)$. Suppose $m/n$ is held constant as $m$ and $n$ tend to $\infty$. A series of papers [7, 3, 5, 10] culminated in the currently best result ([10]) that $I$ is unsatisfiable, with probability tending to 1, if $m/n > 4.75$. Another series of papers [1, 2, 3, 8] culminated in the currently best result ([8]) that $I$ is satisfiable, and that a satisfying assignment for $I$ can be found in polynomial time, with probability tending to 1, if $m/n < 3.003$. Thus, random satisfiable instances of 3-SAT are usually easy to solve except for a small range of values of the ratio $m/n$ that generate mostly satisfiable instances. On the other hand, no polynomial time algorithm is known that almost always verifies unsatisfiability when $m/n$ is a constant greater than 4.75. Moreover, [4] shows that resolution (and therefore other well-known methods for solving Satisfiability, including the Davis-Putnam procedure) almost always requires exponential time to verify unsatisfiability for every constant $m/n > 4.75$.

A positive result for verifying unsatisfiability, if one exists, is clearly much harder to find than the positive results for finding satisfying assignments cited above. A reasonable candidate algorithm should probably avoid searching over many truth assignments to determine that none satisfies a given instance. This paper presents such a strategy. The idea is to recast 3-SAT as a 3-Hitting Set problem and use an approximation algorithm for 3-Hitting Set to prove unsatisfiability. An instance of the 3-Hitting Set problem is a set $S$ of atoms and a collection $\mathcal{T}$ of triples, $\mathcal{T} \subseteq \{T : T \subseteq S,\ |T| = 3\}$. The problem is to find a minimum-size $S' \subseteq S$ such that for every $T \in \mathcal{T}$ there is an $s \in T$ that is also in $S'$. Any subset $S'' \subseteq S$ that satisfies this condition is called a hitting set, and $S'$ is an optimal hitting set. We present the mechanics of the method, demonstrate its feasibility, and show how close we have come to realizing it.
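To make the definition concrete, here is a minimal brute-force sketch in Python. It is exponential in $|S|$ and is meant only for toy instances; the function names `is_hitting_set` and `min_hitting_set` are illustrative and do not come from the paper.

```python
from itertools import combinations


def is_hitting_set(candidate, triples):
    """True if every triple shares at least one atom with `candidate`."""
    return all(candidate & t for t in triples)


def min_hitting_set(atoms, triples):
    """Return a minimum hitting set by exhaustive search (toy instances only).

    Subsets are tried in order of increasing size, so the first hit is optimal.
    Running time is exponential in len(atoms).
    """
    for size in range(len(atoms) + 1):
        for subset in combinations(sorted(atoms), size):
            if is_hitting_set(set(subset), triples):
                return set(subset)
    raise ValueError("some triple has no atom in `atoms`")


triples = [frozenset({1, 2, 3}), frozenset({3, 4, 5}), frozenset({1, 4, 5})]
print(min_hitting_set({1, 2, 3, 4, 5}, triples))  # {1, 3}
```

No single atom hits all three triples, so the optimum has size 2; the search returns the first such pair in sorted order.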


2 Unsatisfiability as a 3-Hitting Set problem

The idea is as follows. Given a random instance $I$ of 3-SAT, keep only the positive and negative clauses (those whose literals are all positive or all negative). Determine the minimum number of variables that must be set to true to satisfy the positive clauses, and the minimum number of variables that must be set to false to satisfy the negative clauses. If the sum of these two numbers is greater than $n$, then some variable would have to be set both to true and to false for all the positive and negative clauses to be satisfied. Since this is impossible, $I$ must be unsatisfiable if the sum is greater than $n$.
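The test just described can be sketched in Python for very small instances. This sketch assumes clauses are tuples of nonzero integers ($+v$ for variable $v$, $-v$ for its complement), and it computes the two minima by brute force, which is exactly the NP-hard step that the rest of the paper proposes to replace with an approximation algorithm. The helper names are hypothetical.

```python
from itertools import combinations


def min_vars_hitting(clauses, n):
    """Fewest variables from 1..n that hit every clause (brute force, toy sizes only).

    Only the variables matter here, so literal signs are dropped.
    """
    var_sets = [frozenset(abs(lit) for lit in c) for c in clauses]
    for size in range(n + 1):
        for subset in combinations(range(1, n + 1), size):
            chosen = set(subset)
            if all(chosen & vs for vs in var_sets):
                return size
    raise ValueError("a clause mentions a variable outside 1..n")


def certifies_unsat(clauses, n):
    """Section 2 test: True means the clauses are provably unsatisfiable."""
    positive = [c for c in clauses if all(lit > 0 for lit in c)]
    negative = [c for c in clauses if all(lit < 0 for lit in c)]
    # If the two minima sum to more than n, some variable would need
    # to be both true and false.
    return min_vars_hitting(positive, n) + min_vars_hitting(negative, n) > n


# All 3-subsets of 5 variables, once positive and once negative.
pos = list(combinations(range(1, 6), 3))
neg = [tuple(-v for v in c) for c in pos]
print(certifies_unsat(pos + neg, 5))  # True: 3 + 3 > 5
print(certifies_unsat(pos, 5))        # False: 3 + 0 <= 5
```

In the first example, at least 3 variables must be true (any 2 false variables leave a positive clause on the remaining 3 unsatisfied only if 3 are false) and, symmetrically, at least 3 must be false, so no assignment exists.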

The problem of determining the minimum number of variables that must be set to true or false is equivalent to a 3-Hitting Set problem in which the atoms are the variables and the triples are the clauses. Unfortunately, the 3-Hitting Set problem is NP-complete. However, if there is an approximation algorithm for 3-Hitting Set with a certain performance guarantee, then unsatisfiability can still be verified. The sections below discuss the likelihood that such an approximation algorithm exists and show, assuming one does exist, how to use it when inputs are generated according to $M(m, n, 3)$.


3 Properties of random 3-SAT instances

This section develops a probabilistic analysis of random instances of 3-SAT generated according to $M(m, n, 3)$ and shows how a polynomial time approximation algorithm for the optimization problem known as 3-Hitting Set can be used to verify unsatisfiability with probability tending to 1.
Lemma 1: For any $\epsilon > 0$, with probability tending to 1, an instance $I$ of 3-SAT generated according to $M(m, n, 3)$ has at least $(m/8)(1-\epsilon)$ negative clauses and at least $(m/8)(1-\epsilon)$ positive clauses.

Proof: It suffices to prove the claim for positive clauses only, since the argument for negative clauses is identical and, if the probabilities of two events tend to 1, then the probability of their intersection also tends to 1. Each clause is positive independently with probability $(1/2)^3 = 1/8$, so the probability that $I$ has at least $r$ positive clauses is

$$\Pr(I \text{ has } \ge r \text{ positive clauses}) = \sum_{k=r}^{m} \binom{m}{k} \left(\frac{1}{8}\right)^{k} \left(\frac{7}{8}\right)^{m-k}.$$

The number of positive clauses is binomially distributed with mean $m/8$. Setting $r = (m/8)(1-\epsilon)$ and using the well-known Chernoff bound on the lower tail of a binomial distribution, we can bound the sum from below as follows:

$$\Pr\bigl(I \text{ has } \ge (m/8)(1-\epsilon) \text{ positive clauses}\bigr) \ge 1 - e^{-\epsilon^2 (m/8)/2}.$$

This tends to 1 as $m$ increases, and the lemma is proved. □
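The bound in Lemma 1 can be compared with the exact binomial tail for a concrete $m$. The sketch below is an illustration with arbitrarily chosen values $m = 800$ and $\epsilon = 0.2$ (assumptions, not from the text); it works in log space via `math.lgamma` so that large binomial coefficients do not overflow a float.

```python
import math


def binom_log_pmf(m, k, p):
    """log of C(m, k) p^k (1-p)^(m-k), computed in log space to avoid overflow."""
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log(1 - p))


def prob_at_least_positive(m, r):
    """Exact Pr(at least r of m random clauses are all-positive); each is so w.p. 1/8."""
    return sum(math.exp(binom_log_pmf(m, k, 1 / 8)) for k in range(r, m + 1))


def chernoff_lower_bound(m, eps):
    """Lemma 1's bound 1 - exp(-eps^2 (m/8) / 2)."""
    return 1 - math.exp(-eps ** 2 * (m / 8) / 2)


m, eps = 800, 0.2
r = math.ceil((m / 8) * (1 - eps))  # integer threshold (m/8)(1 - eps)
exact = prob_at_least_positive(m, r)
bound = chernoff_lower_bound(m, eps)
print(r, round(exact, 3), round(bound, 3))
```

As expected, the exact probability sits above the Chernoff lower bound; the bound is loose for moderate $m$ but suffices for the limit argument.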

Lemma 2: With probability tending to 1, the minimum fraction $\alpha$ of variables that must be set to true (false) to satisfy all the positive (negative) clauses of $I$ is at least the value of $\alpha$ given by

$$\beta = \frac{m}{n} = \frac{8}{1-\epsilon}\cdot\frac{\alpha\ln(\alpha) + (1-\alpha)\ln(1-\alpha)}{\ln\bigl(1-(1-\alpha)^3\bigr)}.$$

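Proof: Consider only the positive clauses, as the case of negative clauses is identical. Let $V^{(\alpha)} = \{v_1, v_2, \ldots, v_{\lfloor \alpha n \rfloor}\}$ be any fixed subset of $\lfloor \alpha n \rfloor$ variables taken from $V$. A positive clause is satisfied by setting only the variables in $V^{(\alpha)}$ to true exactly when at least one of its three variables lies in $V^{(\alpha)}$, which happens with probability at most $1-(1-\alpha)^3$. Since clauses are generated independently, the probability that setting only the variables in $V^{(\alpha)}$ to true satisfies $r$ positive clauses is at most

$$\bigl(1-(1-\alpha)^3\bigr)^{r}.$$

The average number of sets $V^{(\alpha)}$ that satisfy all $r$ positive clauses is therefore at most

$$\binom{n}{\lfloor \alpha n \rfloor}\bigl(1-(1-\alpha)^3\bigr)^{r}.$$

This is an upper bound on the probability that there exists a set $V^{(\alpha)}$ that satisfies the positive clauses. We need to find the maximum $\alpha$ for which this expression tends to 0. Simplifying with Stirling's approximation for factorials, and substituting $(m/8)(1-\epsilon)$ for $r$ since, by Lemma 1, we have at least that many positive clauses, we need to find the maximum $\alpha$ such that

$$\frac{\bigl(1-(1-\alpha)^3\bigr)^{(1-\epsilon)(m/8)}}{\alpha^{\alpha n}(1-\alpha)^{(1-\alpha)n}} = \left(\frac{\bigl(1-(1-\alpha)^3\bigr)^{(1-\epsilon)(m/n)/8}}{\alpha^{\alpha}(1-\alpha)^{1-\alpha}}\right)^{n} \to 0.$$

Taking logarithms, and noting that both $\ln\bigl(1-(1-\alpha)^3\bigr)$ and $\alpha\ln(\alpha) + (1-\alpha)\ln(1-\alpha)$ are negative, this is satisfied if

$$\beta = \frac{m}{n} > \frac{8}{1-\epsilon}\cdot\frac{\alpha\ln(\alpha) + (1-\alpha)\ln(1-\alpha)}{\ln\bigl(1-(1-\alpha)^3\bigr)}.$$

The lemma follows. □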

A result similar to Lemma 2 is proved in [6].
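The bound of Lemma 2 is easy to evaluate numerically. The sketch below computes the right side of Lemma 2 and solves for $\alpha$ given $\beta$ by bisection. It assumes that the right side increases with $\alpha$ on $(0, 1)$, which holds numerically but is not proved in the text; `lemma2_rhs` and `alpha_lower_bound` are hypothetical names.

```python
import math


def lemma2_rhs(alpha, eps=0.0):
    """Right side of Lemma 2: 8/(1-eps) * (a ln a + (1-a) ln(1-a)) / ln(1-(1-a)^3)."""
    num = alpha * math.log(alpha) + (1 - alpha) * math.log1p(-alpha)
    den = math.log1p(-(1 - alpha) ** 3)  # log1p keeps precision when (1-a)^3 is tiny
    return 8 / (1 - eps) * num / den


def alpha_lower_bound(beta, eps=0.0, tol=1e-12):
    """Bisection for the alpha in (0, 1) at which lemma2_rhs(alpha) equals beta.

    Assumes lemma2_rhs is increasing in alpha (checked numerically, not proved).
    """
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lemma2_rhs(mid, eps) < beta:
            lo = mid
        else:
            hi = mid
    return lo


print(round(lemma2_rhs(0.5), 2))  # 41.53
print(alpha_lower_bound(100))
```

With $\beta = 100$, for instance, the bisection puts the lower bound on the fraction of true variables between 0.6 and 0.65.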


By Lemma 2, with probability tending to 1, if $\beta > 41.52$, then the number of variables that must be set to true to satisfy the positive clauses and the number of variables that must be set to false to satisfy the negative clauses both exceed $n/2$. Consequently, with probability tending to 1, a random instance of 3-SAT with $\beta > 41.52$ is unsatisfiable, and this fact could be verified in polynomial time if there were a fast algorithm for finding the minimum number of true (false) variables needed to satisfy the positive (negative) clauses.
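The constant 41.52 is the right side of Lemma 2 evaluated at $\alpha = 1/2$ with $\epsilon \to 0$, where the numerator reduces to $\ln(1/2)$ and the denominator to $\ln(7/8)$:

```latex
\left.\frac{\alpha\ln(\alpha) + (1-\alpha)\ln(1-\alpha)}{\ln\bigl(1-(1-\alpha)^3\bigr)}\right|_{\alpha=1/2}
  = \frac{\ln(1/2)}{\ln(7/8)}
  = \frac{\ln 2}{\ln(8/7)}
  \approx \frac{0.693147}{0.133531}
  \approx 5.1909,
\qquad
8 \times 5.1909 \approx 41.527 .
```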

The problem of finding the minimum number of true variables needed to satisfy the positive clauses is equivalent to finding a minimum 3-Hitting Set for a collection of triples that is in one-to-one correspondence with the clauses. Although this problem is NP-complete, if there is a good enough polynomial time approximation algorithm for 3-Hitting Set, we can use it to reliably verify unsatisfiability in polynomial time for large but constant ratios $\beta$. By reliably verifying unsatisfiability in polynomial time we mean that the algorithm provides a polynomial time test of unsatisfiability which, if successful, proves that a given instance is unsatisfiable, and which fails with probability tending to 0. The question of precisely how good such an approximation algorithm must be to reliably verify unsatisfiability in polynomial time is answered after the following discussion.


Suppose there is a polynomial time approximation algorithm $\mathcal{A}$ for 3-Hitting Set instances $\mathcal{H}$ with $m$ triples taken from $n$ atoms that has the following approximation property: if the minimum hitting set for $\mathcal{H}$ has at most $n/2$ elements, then $\mathcal{A}$ returns a hitting set of at most $n\gamma_{\mathcal{A}}(m,n)$ elements. We can apply $\mathcal{A}$ to $\mathcal{H}$ and, if $\beta$ is large enough to make $\alpha n > n\gamma_{\mathcal{A}}(m,n)$, then, with probability tending to 1, the minimum hitting set for $\mathcal{H}$ has more than $n\gamma_{\mathcal{A}}(m,n)$ elements. Since every hitting set is at least as large as the minimum one, $\mathcal{A}$ then returns a hitting set for $\mathcal{H}$ of size greater than $n\gamma_{\mathcal{A}}(m,n)$ with probability tending to 1. By the approximation property of $\mathcal{A}$, a returned hitting set of size greater than $n\gamma_{\mathcal{A}}(m,n)$ is impossible if the minimum hitting set for $\mathcal{H}$ has size at most $n/2$. Hence, with probability tending to 1, for large enough $\beta$, $\mathcal{A}$ can be used to determine that a set of positive clauses taken from a random instance of 3-SAT requires more than $n/2$ true variables to be satisfied. It follows that $\mathcal{A}$ can be used to reliably verify unsatisfiability in polynomial time for large enough $\beta$. It remains to determine what $\gamma_{\mathcal{A}}$ must be to support this observation for constant $\beta$ under model $M(m, n, 3)$.
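The decision rule just described fits in a few lines. This is a sketch under the assumption that some algorithm $\mathcal{A}$ with guarantee $\gamma_{\mathcal{A}}(m,n)$ has already been run on the positive and on the negative clauses; $\mathcal{A}$ itself is not implemented here, and the function name and arguments are hypothetical.

```python
def certify_unsat_with_approx(pos_size, neg_size, n, gamma):
    """Section 3 decision rule (sketch).

    pos_size, neg_size: sizes of the hitting sets returned by an approximation
        algorithm A for the positive and the negative clauses.
    n: number of variables.
    gamma: A's guarantee gamma_A(m, n); A returns at most n * gamma atoms
        whenever the minimum hitting set has at most n/2 atoms.
    Returns True only when the instance is provably unsatisfiable.
    """
    # A result larger than n * gamma is impossible if the optimum were <= n/2,
    # so such a result proves that more than n/2 variables are needed.
    pos_needs_over_half = pos_size > n * gamma
    neg_needs_over_half = neg_size > n * gamma
    # More than n/2 true and more than n/2 false variables cannot coexist.
    return pos_needs_over_half and neg_needs_over_half


print(certify_unsat_with_approx(70, 72, 100, 0.65))  # True
print(certify_unsat_with_approx(60, 72, 100, 0.65))  # False
```

A `False` answer proves nothing about satisfiability; it only means the test did not succeed, which by the argument above happens with probability tending to 0 for large enough $\beta$.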


Theorem 3: Let $I$ be an instance of 3-SAT generated from $M(m, n, 3)$, let $\beta$ be the limiting ratio $m/n$, and suppose $\beta > 41.52$. Let $\mathcal{A}$ be a polynomial time approximation algorithm for 3-Hitting Set with performance guarantee $\gamma_{\mathcal{A}}(m,n)$. Suppose there exists a function $c(\beta)$, with $1/2 \le c(\beta) \le 1$ and $c(\beta)$ decreasing with increasing $\beta$, such that, for very small $\epsilon > 0$,

$$\gamma_{\mathcal{A}}(m,n) < 1 - \sqrt{\frac{c(\beta)\ln\bigl(\beta(1-\epsilon)/8\bigr)}{\beta(1-\epsilon)/8}}.$$

Then, with probability tending to 1, $\mathcal{A}$ verifies that $I$ is unsatisfiable.


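Proof: We need to show that the right side of the equation of Lemma 2 is less than $\beta$ after substituting $\gamma_{\mathcal{A}}$ for $\alpha$. First, we do this with $c(\beta) = 1$ to provide an upper bound valid for all $\beta$. We assume $\gamma_{\mathcal{A}} > 1/2$, since otherwise the theorem follows immediately. In what follows we ignore $\epsilon$, which is assumed to be a constant very close to 0, for simplicity, and write $\gamma$ for $\gamma_{\mathcal{A}}$:

$$\begin{aligned}
\beta &> 8\,\frac{\ln(\beta/8)}{(1-\gamma)^2} && \text{(by hypothesis)}\\
&> 8\,\frac{\ln\bigl(\ln(\beta/8)/(1-\gamma)^2\bigr)}{(1-\gamma)^2} && \text{(by substitution)}\\
&= 8\,\frac{\ln(\ln(\beta/8)) - 2\ln(1-\gamma)}{(1-\gamma)^2} && \text{(by simplification)}\\
&> 8\,\frac{-\gamma\ln(\gamma)/(1-\gamma) - \ln(1-\gamma)}{(1-\gamma)^2} && \text{(since } (1-x)\ln(1-x) < x\ln(x) \text{ if } x > 1/2\text{, and } \ln(\ln(\beta/8)) > 0 \text{ for } \beta > 8e\text{)}\\
&= 8\,\frac{-\gamma\ln(\gamma) - (1-\gamma)\ln(1-\gamma)}{(1-\gamma)^3} && \text{(multiply top and bottom by } 1-\gamma\text{)}\\
&> 8\,\frac{\gamma\ln(\gamma) + (1-\gamma)\ln(1-\gamma)}{\ln\bigl(1-(1-\gamma)^3\bigr)} && \text{(since } \ln(1-x) < -x\text{)}.
\end{aligned}$$

Since the last expression is the right side of the equation of Lemma 2 with $\gamma_{\mathcal{A}}$ substituted for $\alpha$, we have $\gamma_{\mathcal{A}} < \alpha$.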

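Next, we show that $c(\beta) = 1/2$ suffices when $\beta$ is large (again writing $\gamma$ for $\gamma_{\mathcal{A}}$):

$$\begin{aligned}
\beta &> 8\,\frac{(1/2)\ln(\beta/8)}{(1-\gamma)^2} && \text{(by hypothesis)}\\
&> 8\,\frac{(1/2)\ln\bigl((1/2)\ln(\beta/8)/(1-\gamma)^2\bigr)}{(1-\gamma)^2} && \text{(by substitution)}\\
&= 8\,\frac{(1/2)\ln\bigl((1/2)\ln(\beta/8)\bigr) - \ln(1-\gamma)}{(1-\gamma)^2} && \text{(by simplification)}\\
&> 8\,\frac{-\gamma\ln(\gamma)/(1-\gamma) - \ln(1-\gamma)}{(1-\gamma)^2} && \text{(for } \beta \to \infty\text{, } (1/2)\ln\bigl((1/2)\ln(\beta/8)\bigr) > 1 > -\gamma\ln(\gamma)/(1-\gamma)\text{)}\\
&= 8\,\frac{-\gamma\ln(\gamma) - (1-\gamma)\ln(1-\gamma)}{(1-\gamma)^3} && \text{(multiply top and bottom by } 1-\gamma\text{)}\\
&> 8\,\frac{\gamma\ln(\gamma) + (1-\gamma)\ln(1-\gamma)}{\ln\bigl(1-(1-\gamma)^3\bigr)} && \text{(since } \ln(1-x) < -x\text{)}.
\end{aligned}$$

□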
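Theorem 3 can be sanity-checked numerically, although this does not replace the proof. The sketch below takes $\epsilon \to 0$, sets $\gamma_{\mathcal{A}}$ exactly at the theorem's bound, and tests whether the right side of Lemma 2 at $\gamma_{\mathcal{A}}$ stays below $\beta$. The case $c(\beta) = 1/2$ is only tried at $\beta = 10^9$, because the proof's condition $(1/2)\ln\bigl((1/2)\ln(\beta/8)\bigr) > 1$ requires $\beta$ above roughly $2 \times 10^7$. The helper names are hypothetical.

```python
import math


def lemma2_rhs(alpha):
    """Right side of Lemma 2 with eps -> 0."""
    num = alpha * math.log(alpha) + (1 - alpha) * math.log1p(-alpha)
    return 8 * num / math.log1p(-(1 - alpha) ** 3)


def theorem3_bound(beta, c):
    """Theorem 3's bound on gamma_A with eps -> 0: 1 - sqrt(c ln(beta/8) / (beta/8))."""
    x = beta / 8
    return 1 - math.sqrt(c * math.log(x) / x)


# Put gamma_A exactly at the bound and check Lemma 2's right side stays below beta.
checks = {}
for beta, c in [(50, 1.0), (100, 1.0), (1000, 1.0), (10**6, 1.0), (10**9, 0.5)]:
    gamma = theorem3_bound(beta, c)
    checks[(beta, c)] = lemma2_rhs(gamma) < beta
    print(beta, c, round(gamma, 4), checks[(beta, c)])
```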


4 An approximation algorithm for 3-Hitting Set

In this section we present an approximation algorithm $\mathcal{A}$ for 3-Hitting Set with $\gamma_{\mathcal{A}}(m,n) \le 1 - 2\sqrt{3}/(9\sqrt{\beta})$, where $m$ is the number of triples and $n$ is the number of atoms from which the triples are taken. The algorithm, presented in Figure 1, is related to but weaker than the obvious greedy method, since it selects any atom that occurs at least the average number of times among the remaining sets instead of one that occurs the maximum number of times. Although this $\gamma_{\mathcal{A}}(m,n)$ is not small enough to satisfy Theorem 3, we note that it is fairly close to what is needed. It is possible that a similar algorithm with a more accurate analysis will yield the required approximation result.

$\mathcal{A}(\mathcal{G}, A)$:

Input: An instance $(\mathcal{G}, A)$ of 3-Hitting Set: $\mathcal{G}$ is a collection of sets taken from the atom set $A$;

Output: A hitting set for $(\mathcal{G}, A)$;

1. Set $S = 3|\mathcal{G}|/|A|$.

2. Set $T = \emptyset$.

3. Repeat the following as long as $S > 1$.

   (a) Choose an atom $a \in A$ such that $S(a) \ge S$.

   (b) Set $T = T \cup \{a\}$.

   (c) Set $A = A - \{a\}$.

   (d) Set $\mathcal{G} = \mathcal{G} - \{B : B \in \mathcal{G} \text{ and } a \in B\}$.

   (e) Set $S = 3|\mathcal{G}|/|A|$.

4. Repeat the following as long as $\mathcal{G} \neq \emptyset$.

   (a) Choose a set $B \in \mathcal{G}$.

   (b) Choose an atom $a \in B$.

   (c) Set $T = T \cup \{a\}$.

   (d) Set $\mathcal{G} = \mathcal{G} - \{B' : B' \in \mathcal{G} \text{ and } a \in B'\}$.

5. Output $T$.

Figure 1: Algorithm for finding 3-covers of 3-cover graphs

An unusual operation performed within $\mathcal{A}$ is computing the average number of sets containing a particular atom. Let $A$ denote a set of atoms, $\mathcal{G}$ a collection of 3-subsets of $A$, and $(\mathcal{G}, A)$ the corresponding instance of 3-Hitting Set. Let $S(a)$ denote the number of sets in $\mathcal{G}$ containing atom $a$. The average number of sets containing a particular atom is $S = \sum_{a \in A} S(a)/|A|$. At the outset, $S = 3m/n$. On every iteration of the main loop of $\mathcal{A}$, the sets containing one of the more frequently occurring atoms are eliminated, which lowers $S$ somewhat. However, the way $S$ is computed does not change from one iteration to the next: every remaining set still has its 3 atoms in $A$, so $S$ is 3 times the ratio of the number of remaining sets to the number of atoms not yet considered.
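Figure 1 can be turned into a short Python sketch. It assumes every input set has exactly 3 distinct atoms drawn from the atom set, and it compares $3|\mathcal{G}|$ with $|A|$ (and $S(a)\,|A|$ with $3|\mathcal{G}|$) in integers rather than forming the float $S$, to avoid rounding trouble. The name `algorithm_a` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def algorithm_a(triples, atoms):
    """Sketch of algorithm A from Figure 1 for 3-Hitting Set; returns a hitting set."""
    remaining = [frozenset(t) for t in triples]
    live = set(atoms)  # atoms "not yet considered"
    if any(len(t) != 3 or not t <= live for t in remaining):
        raise ValueError("every set must contain exactly 3 distinct atoms from `atoms`")
    hitting = set()

    # Step 3: while S = 3|G|/|A| > 1.
    while 3 * len(remaining) > len(live):
        counts = Counter(a for t in remaining for a in t)
        # S is the mean of S(a) over `live`, so some atom has S(a) >= S.
        a = next(x for x in live if counts[x] * len(live) >= 3 * len(remaining))
        hitting.add(a)
        live.discard(a)
        remaining = [t for t in remaining if a not in t]

    # Step 4: cover each leftover set with one of its atoms.
    while remaining:
        a = next(iter(remaining[0]))
        hitting.add(a)
        remaining = [t for t in remaining if a not in t]
    return hitting


toy = [frozenset(c) for c in combinations(range(5), 3)]
result = algorithm_a(toy, range(5))
print(len(result), all(result & t for t in toy))  # 3 True
```

Taking the first qualifying atom rather than the most frequent one mirrors the text's point that $\mathcal{A}$ is weaker than greedy; using `counts.most_common(1)[0][0]` instead would give the greedy variant.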

Theorem 4: Algorithm $\mathcal{A}$ always returns a hitting set for $(\mathcal{G}, A)$ and runs in time bounded by a polynomial in $|\mathcal{G}|$ and $|A|$.
Proof: We show that the following loop invariant holds before each step of either loop: $T$ is a hitting set for all sets eliminated from the original $\mathcal{G}$ so far. Clearly, this is true before the first iteration of the first loop, when no set has been eliminated. On every succeeding iteration, every eliminated set has at least one of its atoms placed in $T$. Since the second loop ends only when $\mathcal{G}$ is empty, every original set has been eliminated by then. Correctness follows.

The total number of iterations is at most $|A|$, since each iteration adds a new atom to $T$. Each iteration takes $O(|\mathcal{G}|)$ time. Hence, $\mathcal{A}$ runs in polynomial time. □

Theorem 5: $\gamma_{\mathcal{A}}(m,n) \le 1 - 2\sqrt{3}/(9\sqrt{\beta})$.

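Proof: At iteration $k$ of Step 3, the number of atoms remaining is $n-k$. Each earlier iteration $i$ selects an atom with $S(a) \ge S = 3|\mathcal{G}|/(n-i)$, so it eliminates at least a fraction $3/(n-i)$ of the sets remaining at that time; hence the number of sets remaining is no greater than $m\prod_{i=0}^{k-1}\bigl(1 - 3/(n-i)\bigr)$. Therefore, at iteration $k$ of Step 3,

$$S \le \frac{3m}{n-k}\prod_{i=0}^{k-1}\Bigl(1 - \frac{3}{n-i}\Bigr).$$

We find the value of $k$ that makes the right side of this inequality equal to 1. This is an upper bound on the number of iterations taken by Step 3. It suffices to find $k$ such that

$$\ln(3) + \ln(m) - \ln(n-k) + \sum_{i=0}^{k-1}\ln\Bigl(1 - \frac{3}{n-i}\Bigr) = 0.$$

Using $\ln(1-x) = -x - \Theta(x^2)$ for small $x$, the above can be written

$$\ln(3) + \ln(m) - \ln(n-k) - \sum_{i=0}^{k-1}\Bigl(\frac{3}{n-i} + \Theta\Bigl(\frac{1}{(n-i)^2}\Bigr)\Bigr) = 0.$$

Using $\sum_{i=0}^{k-1} 1/(n-i) = H_n - H_{n-k}$, where $H_j = \ln(j) + 0.5772\ldots + \Theta(1/j)$ is the $j$-th harmonic number, and $\sum_{i=0}^{k-1} 1/(n-i)^2 = O(1/(n-k))$, we have

$$\begin{aligned}
\ln(3) + \ln(m) - \ln(n-k) - 3\ln(n) + 3\ln(n-k) + \Theta\bigl(1/(n-k)\bigr) &= 0\\
\ln(3) + \ln(m/n) - 2\bigl(\ln(n) - \ln(n-k)\bigr) + \Theta\bigl(1/(n-k)\bigr) &= 0\\
\ln(3) + \ln(\beta) - \ln\bigl((n/(n-k))^2\bigr) + \Theta\bigl(1/(n-k)\bigr) &= 0\\
\ln\bigl((n/(n-k))^2/3\bigr) + \ln\bigl(e^{\Theta(1/(n-k))}\bigr) &= \ln(\beta)\\
(n/(n-k))^2/3 &= \beta e^{-\Theta(1/(n-k))}\\
k &= n - n/\sqrt{3\beta} \quad \text{(for large } n\text{)}.
\end{aligned}$$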


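The size of $\mathcal{G}$ just before Step 4 begins is, following steps similar to those above, no greater than

$$m\prod_{i=0}^{k-1}\Bigl(1 - \frac{3}{n-i}\Bigr) \approx m\Bigl(\frac{n-k}{n}\Bigr)^{3} = \frac{m}{(3\beta)^{3/2}},$$

where $k = n\bigl(1 - 1/\sqrt{3\beta}\bigr)$. Each iteration of Step 4 adds one atom and eliminates at least one set, so the total number of atoms used in the hitting set is no greater than

$$k + \frac{m}{(3\beta)^{3/2}} = n - \frac{n}{\sqrt{3\beta}} + \frac{\beta n}{(3\beta)^{3/2}} = n - \frac{n}{\sqrt{3\beta}} + \frac{n}{3\sqrt{3\beta}} = n\Bigl(1 - \frac{2\sqrt{3}}{9\sqrt{\beta}}\Bigr).$$

Hence, $\gamma_{\mathcal{A}}(m,n) \le 1 - 2\sqrt{3}/(9\sqrt{\beta})$. □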
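The asymptotic value of $k$ in the proof of Theorem 5 can be checked numerically. The sketch assumes $m = \beta n$ exactly and evaluates the proof's upper bound on $S$ (not the algorithm itself) until it first drops to 1; the function name is hypothetical. Because each factor $1 - 3/(n-i)$ equals $(n-i-3)/(n-i)$, the product telescopes to $(n-k)(n-k-1)(n-k-2)/(n(n-1)(n-2))$, which is why the bound behaves like $3\beta\bigl((n-k)/n\bigr)^2$.

```python
import math


def step3_iterations(n, beta):
    """First k at which the proof's bound 3m/(n-k) * prod_{i<k}(1 - 3/(n-i)) is <= 1."""
    m = beta * n
    prod = 1.0  # prod_{i<k} (1 - 3/(n-i)), updated incrementally
    for k in range(n - 3):  # keep n - k >= 4 so every factor stays positive
        if 3 * m * prod / (n - k) <= 1:
            return k
        prod *= 1 - 3 / (n - k)
    return n - 3


n, beta = 20000, 50
k_numeric = step3_iterations(n, beta)
k_formula = n - n / math.sqrt(3 * beta)
print(k_numeric, round(k_formula))
```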

5 Is a better approximation algorithm possible?

Given the negative results on approximation algorithms for Hitting Set and other hard optimization problems found in [9], the question arises whether an approximation algorithm for Hitting Set satisfying the requirements of Theorem 3 is possible. In [9] it is shown that an approximation algorithm for Hitting Set with approximation ratio less than $c\log(n)$, for some $c$ close to 1, is unlikely. However, 3-Hitting Set, although MAX SNP-hard, can be approximated within a constant factor in polynomial time. Moreover, it has been suggested ([11]) that the guaranteed constant factor should be close to 1. Hence, the existence of such an approximation algorithm is likely.
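To see how much better than Theorem 5 an algorithm must do, one can tabulate both bounds with $\epsilon \to 0$ and $c(\beta) = 1/2$. The gap $1 - \gamma_{\mathcal{A}}$ required by Theorem 3 is $\sqrt{4\ln(\beta/8)/\beta}$, while Theorem 5 gives $2\sqrt{3}/(9\sqrt{\beta})$, so their ratio is $3\sqrt{3}\sqrt{\ln(\beta/8)}$, which grows only slowly with $\beta$. The function names below are hypothetical.

```python
import math


def theorem5_gamma(beta):
    """Guarantee of the Figure 1 algorithm (Theorem 5): 1 - 2*sqrt(3)/(9*sqrt(beta))."""
    return 1 - 2 * math.sqrt(3) / (9 * math.sqrt(beta))


def theorem3_needed(beta, c=0.5):
    """Theorem 3's requirement with eps -> 0: 1 - sqrt(c*ln(beta/8)/(beta/8))."""
    x = beta / 8
    return 1 - math.sqrt(c * math.log(x) / x)


betas = [50, 100, 1000, 10**4, 10**6]
for beta in betas:
    # Theorem 3 needs gamma_A below the second column; Theorem 5 gives the first.
    print(beta, round(theorem5_gamma(beta), 4), round(theorem3_needed(beta), 4))
```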